Multi-modal analysis for person type classification in news video
Abstract
Classifying the people who appear in broadcast news video as anchors, reporters, or news subjects is an important topic in high-level video analysis. Because different types of people resemble one another visually, this work explores multi-modal features derived from several kinds of evidence, such as speech identity, transcript clues, temporal video structure, and named entities, and uses a statistical learning approach to combine all the features for person type classification. Experiments on ABC World News Tonight video demonstrate the effectiveness of the approach and compare the contributions of the different feature categories.
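The abstract does not say which statistical learner or feature encoding the authors used. As a rough illustration of combining per-modality evidence into a single person-type decision, here is a minimal sketch that assumes softmax regression over concatenated binary features. The feature names (`speech`, `transcript`, `structure`, `names`), their meanings, and all toy values are invented for this example and are not taken from the paper.

```python
import math

# Person types named in the abstract.
LABELS = ["anchor", "reporter", "news_subject"]


def combine(modalities):
    """Concatenate per-modality feature lists into one vector (with a bias term)."""
    vec = [1.0]
    for name in sorted(modalities):  # fixed order so every vector lines up
        vec.extend(modalities[name])
    return vec


def softmax(scores):
    """Turn raw class scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    """One linear score per person type."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train(samples, targets, epochs=500, lr=0.5):
    """Fit softmax regression with plain stochastic gradient descent."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in LABELS]
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            probs = softmax(class_scores(weights, x))
            for k, row in enumerate(weights):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    row[j] -= lr * grad * x[j]
    return weights


def predict(weights, x):
    """Return the person type with the highest score."""
    scores = class_scores(weights, x)
    return LABELS[scores.index(max(scores))]


def person(speech, transcript, structure, names):
    """Hypothetical binary cues, one per modality (assumed, not from the paper):
    speech: voice matches a known anchor; transcript: sign-off phrase nearby;
    structure: shot opens a story; names: a person name is mentioned nearby."""
    return combine({"speech": [speech], "transcript": [transcript],
                    "structure": [structure], "names": [names]})


# Invented toy training data.
samples = [
    person(1, 0, 1, 0), person(1, 0, 1, 1),  # anchors
    person(0, 1, 0, 1), person(0, 1, 1, 0),  # reporters
    person(0, 0, 0, 1), person(0, 0, 1, 1),  # news subjects
]
targets = [0, 0, 1, 1, 2, 2]

weights = train(samples, targets)
predictions = [predict(weights, x) for x in samples]
print(predictions)
```

Concatenating the features before learning is only one way to combine modalities. Training a separate classifier per modality and merging their outputs would be another, and the abstract does not indicate which the authors chose.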
Similar resources
Finding Person X: Correlating Names with Visual Appearances
People as news subjects carry rich semantics in broadcast news video and therefore finding a named person in the video is a major challenge for video retrieval. This task can be achieved by exploiting the multi-modal information in videos, including transcript, video structure, and visual features. We propose a comprehensive approach for finding specific persons in broadcast news videos by expl...
Integrating multi-modal content analysis and hyperbolic visualization for large-scale news video retrieval and exploration
In this paper, we have developed a novel scheme to achieve more effective analysis, retrieval and exploration of large-scale news video collections by performing multi-modal video content analysis and synchronization. First, automatic keyword extraction is performed on news closed captions and audio channels to detect the most interesting news topics (i.e., keywords for news topic interpretatio...
Interesting faces: A graph-based approach for finding people in news
In this study, we propose a method for finding people in large news photograph and video collections. Our method exploits the multi-modal nature of these data sets to recognize people and does not require any supervisory input. It first uses the name of the person to populate an initial set of candidate faces. From this set, which is likely to include the faces of other people, it selects the g...
The Segmentation and Classification of Story Boundaries in News Video
The segmentation and classification of news video into single-story semantic units is a challenging problem. This research proposes a two-level, multi-modal framework to tackle this problem. The video is analyzed at the shot and story unit (or scene) levels using a variety of features and techniques. At the shot level, we employ a Decision Tree to classify the shot into one of 13 pre-defined ca...
Capturing text semantics for concept detection in news video
The overwhelming amount of multimedia content has triggered the need for automatic semantic concept detection. However, because the visual feature space varies widely, text from automatic speech recognition (ASR) has been used extensively and found effective as a complement to visual features in the concept detection task. Generally, there are two common text analysis methods. On...
Publication date: 2005